    Recognition times for 54 thousand Dutch words: data from the Dutch Crowdsourcing Project

    We present a new database of Dutch word recognition times for a total of 54 thousand words, called the Dutch Crowdsourcing Project. The data were collected with an Internet vocabulary test and are limited to native Dutch speakers. Participants were asked to indicate which words they knew, and their response times were registered even though they were not asked to respond as fast as possible. Still, the response times correlate around .7 with those of the Dutch Lexicon Projects for shared words. Results of virtual experiments also indicate that the new response times are a valid addition to the Dutch Lexicon Projects. This not only means that we have useful response times for some 20 thousand extra words, but also that we now have data on differences in response latencies as a function of education and age. In addition, the new data correspond better to word use in the Netherlands.
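
    As a rough illustration of the validation step described above, the sketch below correlates crowdsourced recognition times with Dutch Lexicon Project latencies over the words the two databases share. The file and column names are hypothetical placeholders, not the published data format.

```python
# Hypothetical sketch: correlate crowdsourced recognition times with
# Dutch Lexicon Project latencies over shared words. File and column
# names are illustrative assumptions, not the published format.
import pandas as pd

crowd = pd.read_csv("dutch_crowdsourcing_rts.csv")   # columns: word, rt
dlp = pd.read_csv("dutch_lexicon_project_rts.csv")   # columns: word, rt

shared = crowd.merge(dlp, on="word", suffixes=("_crowd", "_dlp"))
r = shared["rt_crowd"].corr(shared["rt_dlp"])        # Pearson correlation
print(f"{len(shared)} shared words, r = {r:.2f}")    # the paper reports r around .7
```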

    A plea for more interactions between psycholinguistics and natural language processing research

    Get PDF
    A new development in psycholinguistics is the use of regression analyses on tens of thousands of words, known as the megastudy approach. This development has led to the collection of processing times and subjective ratings (of age of acquisition, concreteness, valence, and arousal) for most of the existing words in English and Dutch. In addition, a crowdsourcing study in the Dutch language has resulted in information about how well 52,000 lemmas are known. This information is likely to be of interest to NLP researchers and computational linguists. At the same time, large-scale measures of word characteristics developed in the latter traditions are likely to be pivotal in bringing the megastudy approach to the next level.
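
    A minimal sketch of what a megastudy-style regression looks like in practice, assuming an item-level file with response times and word properties; the predictor set and column names are illustrative, not the variables of any particular study.

```python
# Hypothetical megastudy-style regression: item-level response times
# modelled from word properties. Column names are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

items = pd.read_csv("megastudy_items.csv")  # word, rt, frequency, length, aoa, concreteness
items["log_freq"] = np.log10(items["frequency"] + 1)  # frequency effects are roughly log-shaped

model = smf.ols("rt ~ log_freq + length + aoa + concreteness", data=items).fit()
print(model.rsquared)   # variance explained across tens of thousands of items
print(model.params)     # direction and size of each word-property effect
```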

    Perception of typicality in the lexicon: word-form typicality, lexical density, and morphonotactic constraints

    The extent to which a symbolic time series (a sequence of sounds or letters) is a typical word of a language, referred to as WORDLIKENESS, has been shown to have effects in speech perception and production, reading proficiency, lexical development and lexical access, and short-term and long-term verbal memory. Two quantitative models have been suggested to account for these effects: serial phonotactic probabilities (the likelihood for a given symbolic sequence to appear in the lexicon) and lexical density (the extent to which other words can be obtained from a target word by changing, deleting or inserting one or more symbols in the target). The two measures are highly correlated and thus easily confounded when measuring their effects in lexical tasks. In this paper, we propose a computational model of lexical organisation, based on Self-Organising Maps with Hebbian connections defined over a temporal layer (TSOMs), providing a principled algorithmic account of effects of lexical acquisition, processing and access, to investigate these issues further. In particular, we show that (morpho-)phonotactic probabilities and lexical density, though correlated in lexical organisation, can be taken to focus on different aspects of speakers' word processing behaviour and thus provide independent cognitive contributions to our understanding of the principles of perception of typicality that govern lexical organisation.
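
    The two competing measures can be made concrete with a toy computation. The sketch below, which is a simplified stand-in and not the TSOM model itself, estimates a word's serial bigram probability and its one-edit neighbourhood density over a small illustrative lexicon.

```python
# Toy illustration of the two wordlikeness measures discussed above:
# serial (bigram) phonotactic probability and lexical density. This is
# a simplified stand-in, not the TSOM model proposed in the paper.
import math
from collections import Counter

lexicon = ["cat", "cot", "coat", "can", "bat", "bad", "mat"]  # illustrative

# Bigram counts over the lexicon, with '#' marking word boundaries.
bigrams = Counter(pair for w in lexicon for pair in zip("#" + w, w + "#"))
total = sum(bigrams.values())

def serial_probability(word):
    """Mean log probability of the word's boundary-padded bigrams (add-one smoothed)."""
    pairs = list(zip("#" + word, word + "#"))
    return sum(math.log((bigrams[p] + 1) / (total + 1)) for p in pairs) / len(pairs)

def neighbourhood_density(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Number of lexicon words one substitution, deletion, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    neighbours = ({a + c + b[1:] for a, b in splits if b for c in alphabet} |  # substitute
                  {a + b[1:] for a, b in splits if b} |                        # delete
                  {a + c + b for a, b in splits for c in alphabet})            # insert
    return sum(1 for w in lexicon if w != word and w in neighbours)

print(serial_probability("cat"), neighbourhood_density("cat"))
```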

    Corpus linguistics

    The first comprehensive guide to research methods and technologies in psycholinguistics and the neurobiology of language. Bringing together contributions from a distinguished group of researchers and practitioners who explain the underlying assumptions and rationales of their research methods, editors Annette M. B. de Groot and Peter Hagoort explore the methods and technologies used by researchers of language acquisition, language processing, and communication, including traditional observational and behavioral methods, computational modelling, corpus linguistics, and virtual reality. The book also examines neurobiological methods, including functional and structural neuroimaging and molecular genetics. Ideal for students engaged in the field, Research Methods in Psycholinguistics and the Neurobiology of Language examines the relative strengths and weaknesses of various methods in relation to competing approaches. It describes the apparatus involved, the nature of the stimuli and data used, and the data collection and analysis techniques for each method. Featuring numerous example studies, along with many full-color illustrations, this indispensable text will help readers gain a clear picture of the practices and tools described.

    Assessing the Usefulness of Google Books' Word Frequencies for Psycholinguistic Research on Word Processing

    In this Perspective Article, we assess the usefulness of Google's new word frequencies for word recognition research (lexical decision and word naming). We find that, despite the massive corpus on which the Google estimates are based (131 billion words from books published in the United States alone), the Google American English frequencies explain 11% less of the variance in the lexical decision times from the English Lexicon Project (Balota et al., 2007) than the SUBTLEX-US word frequencies, which are based on a corpus of 51 million words from film and television subtitles. Further analyses indicate that word frequencies derived from recent books (published after 2000) are better predictors of word processing times than frequencies based on the full corpus, and that word frequencies based on fiction books likewise outperform those based on the full corpus. Even the most predictive word frequencies from Google do not explain more of the variance in the word recognition times of undergraduate students and older adults than the subtitle-based word frequencies.
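
    A hedged sketch of the kind of comparison reported above: regress lexical decision times on each log-transformed frequency norm and compare the variance explained. The file and column names are assumptions for illustration; the actual norms are distributed separately.

```python
# Hypothetical comparison of two frequency norms as predictors of
# lexical decision times. File and column names are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

d = pd.read_csv("elp_items_with_frequencies.csv")  # word, rt, google_freq, subtlex_freq
for norm in ("google_freq", "subtlex_freq"):
    d["log_f"] = np.log10(d[norm] + 1)             # log frequency is the standard predictor
    r2 = smf.ols("rt ~ log_f", data=d).fit().rsquared
    print(f"{norm}: R^2 = {r2:.3f}")               # higher R^2 = more variance explained
```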

    Which words do English non-native speakers know? New supernational levels based on yes/no decision

    To gather more information about the English words known by second language (L2) speakers, we ran a large-scale crowdsourcing vocabulary test, which yielded 17 million useful responses. It provided us with a list of 445 words known to nearly all participants. The list was compared to various existing lists of words advised for inclusion in the first stages of English L2 teaching. The data also provided us with a ranking of 61,000 words in terms of degree and speed of word recognition in English L2 speakers, which correlated r = .85 with a similar ranking based on native English speakers. The L2 speakers in our study were relatively better at academic words (which are often cognates in their mother tongue) and at words related to experiences English L2 students are likely to have. They were worse at words related to childhood and family life. Finally, a new list of 20 levels of 1,000 word families each is presented, which will be of use to English L2 teachers, as the levels represent the order in which English vocabulary seems to be acquired by L2 learners across the world.
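
    One plausible way to score such a yes/no vocabulary test, sketched below under assumed column names: take each word's yes-rate and correct it for guessing using each participant's yes-rate to pseudowords. The correction formula is a common signal-detection-style adjustment, not necessarily the one used in the study.

```python
# Hypothetical scoring of a yes/no vocabulary test with a guessing
# correction. Column names and the correction are assumptions.
import pandas as pd

resp = pd.read_csv("yesno_responses.csv")  # participant, item, is_word, said_yes

# Per-participant false-alarm rate: yes-responses to pseudoword trials.
fa = resp.loc[~resp["is_word"]].groupby("participant")["said_yes"].mean()

words = resp.loc[resp["is_word"]].copy()
words["fa"] = words["participant"].map(fa)
# Correct each yes-response for guessing: (yes - fa) / (1 - fa), floored at 0.
words["known"] = ((words["said_yes"] - words["fa"]) / (1.0 - words["fa"])).clip(lower=0)

# Rank items by the corrected proportion of participants who know them.
ranking = words.groupby("item")["known"].mean().sort_values(ascending=False)
print(ranking.head())
```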